
Chapter 8 - Troubleshooting

Cisco TCP/IP Routing Professional Reference
Chris Lewis
  Copyright © 1999 The McGraw-Hill Companies, Inc.

Troubleshooting Protocol Problems
In this section we'll cover two main areas. First, we will look at troubleshooting general IP connectivity related to routing protocol issues. Second, we'll examine basic troubleshooting for popular WAN protocols. In both of these sections we will consider troubleshooting lost connectivity and poor performance.
Troubleshooting IP Connectivity
The usual way to determine if a device is reachable via the IP protocol on an internetwork is by using the ping utility. Ping sends an ICMP echo request from the source to the specified destination. If successful, the returned echo reply proves that all Physical, Data Link, and Network layer functions are operating correctly from the source to the destination. For the purposes of this discussion, failed IP connectivity will mean that a ping packet does not get a reply. It will be assumed that all physical connections are in place and all interfaces and line protocols are up along the path from source to destination. Initially, the discussion focuses on distance vector protocols; a section on OSPF follows.
Troubleshooting Distance Vector Protocols.     If a ping is failing in an internetwork with good Physical and Data Link connections, the first place to look is the routing table of each routing device between the source and destination. Each one in turn should have an entry that states the next hop for the ultimate destination network number. For example, if you are trying to ping a device with an IP address of 164.7.8.33 and the netmask in use on your internetwork is 255.255.255.224, then you would expect to see an entry for the subnet 164.7.8.32 in all the routing tables between the ping source and its destination.
This simple statement, although neat, might not always be true. Suppose the device from which you are sending the ping is on the 170.5.0.0 network. Because routes are summarized at the boundary between major network numbers, each of the devices on this network will have only one entry in its routing table for the 164.7.0.0 network. Once the path from source to destination takes you from the 170.5.0.0 network into the 164.7.0.0 network, you expect to see entries for the 164.7.8.32 subnet. In addition, you expect to see entries in the routing table of all the devices from the destination to the source, for the source's subnet number, so that the ping reply can find its way back to the source.
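The arithmetic behind these expectations can be checked with a short script. What follows is only a sketch in Python using the standard ipaddress module; the function names subnet_of and classful_network are illustrative and are not part of any Cisco tool. It computes the subnet number you would look for inside the 164.7.0.0 network, and the summarized major network number you would look for outside it.

```python
import ipaddress


def subnet_of(host: str, netmask: str) -> str:
    """Return the subnet number expected in routing tables inside the major network."""
    interface = ipaddress.ip_interface(f"{host}/{netmask}")
    return str(interface.network.network_address)


def classful_network(host: str) -> str:
    """Return the major (classful) network number seen after summarization at a boundary."""
    first_octet = int(host.split(".")[0])
    if first_octet < 128:
        prefix = 8    # class A
    elif first_octet < 192:
        prefix = 16   # class B
    elif first_octet < 224:
        prefix = 24   # class C
    else:
        raise ValueError("not a class A, B, or C address")
    network = ipaddress.ip_network(f"{host}/{prefix}", strict=False)
    return str(network.network_address)


# The ping target from the text.
print(subnet_of("164.7.8.33", "255.255.255.224"))  # 164.7.8.32
print(classful_network("164.7.8.33"))              # 164.7.0.0
```

Routers on the 170.5.0.0 network should therefore hold the 164.7.0.0 entry, while routers inside the 164.7.0.0 network should hold the 164.7.8.32 entry.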
So what do we do if one of the devices is missing a routing table entry? The first thing is to check that each device has a routing protocol appropriately enabled. This means that all devices need the same routing protocol enabled (and, for IGRP, the same autonomous system number), and network entries need to be configured for each network that is directly attached to a router.
By viewing the configuration of all the routing devices from source to destination, you confirm that all devices are in fact correctly configured for a routing protocol. What next? You probably want to use some debug commands, starting with debug ip igrp events, which tells you which hosts sent IGRP information and to which hosts IGRP information was sent. The debug ip igrp transactions command details the content of the IGRP updates received and sent in terms of the network numbers and their associated metrics. In a network using RIP, debug ip rip tells you the content of routing updates and which routers they are sent to and received from. Figure 8-12 provides sample outputs from these commands.
LOG OUTPUT OF DEBUG IP IGRP EVENTS COMMAND
IGRP: received update from invalid source 193.1.1.1 on Ethernet0
IGRP: received update from 164.7.1.98 on Serial0
IGRP: Update contains 0 interior, 1 system, and 0 exterior routes.
IGRP: Total routes in update: 1
IGRP: sending update to 255.255.255.255 via Ethernet0 (164.7.1.66)
IGRP: Update contains 1 interior, 1 system, and 0 exterior routes.
IGRP: Total routes in update: 2
IGRP: sending update to 255.255.255.255 via Serial0 (164.7.1.97)
IGRP: Update contains 1 interior, 0 system, and 0 exterior routes.
IGRP: Total routes in update: 1
The first line is interesting as it identifies an IGRP update that appeared on this interface from a source on a different network number and is therefore not accepted on this interface. The rest of the display identifies valid updates received and sent via broadcast.
LOG OUTPUT OF DEBUG IP IGRP TRANSACTIONS COMMAND
IGRP: sending update to 255.255.255.255 via Ethernet0 (164.7.1.66)
subnet 164.7.1.96, metric=8476
network 193.1.1.0, metric=8576
IGRP: sending update to 255.255.255.255 via Serial0 (164.7.1.97) 
subnet 164.7.1.64, metric=1100
IGRP: received update from invalid source 193.1.1.1 on Ethernet0
IGRP: received update from 164.7.1.98 on Serial0
network 193.1.1.0, metric 8576 (neighbor 1100)
This command logs more detailed information on each routing update received and sent.
LOG OUTPUT OF DEBUG IP RIP COMMAND
RIP: sending update to 255.255.255.255 via Serial0 (164.7.1.97)
RIP: Update contains 1 routes
RIP: received update from invalid source 193.1.1.1 on Ethernet0
RIP: received update from 164.7.1.98 on Serial0
193.1.1.0 in 1 hops
RIP: sending update to 255.255.255.255 via Ethernet0 (164.7.1.66)
subnet 164.7.1.96, metric 1
network 193.1.1.0, metric 2
This log output shows the detail of RIP updates sent and received, including the interface received on and route metrics.
Figure 8-12: Sample output of the debug ip igrp events, debug ip igrp transactions, and debug ip rip commands
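The "invalid source" lines in these displays follow from a simple test: the update arrived from an address that is not on the same subnet as the receiving interface. A minimal Python sketch of that test is shown here. The 255.255.255.224 netmask is an assumption inferred from the 164.7.1.64 and 164.7.1.96 subnets in the display, and the function name is illustrative only.

```python
import ipaddress


def is_valid_update_source(source: str, interface_address: str, netmask: str) -> bool:
    """Return True if the update source lies on the receiving interface's subnet."""
    local = ipaddress.ip_interface(f"{interface_address}/{netmask}")
    return ipaddress.ip_address(source) in local.network


# Assumed netmask; not stated in the debug output itself.
MASK = "255.255.255.224"

# Ethernet0 is 164.7.1.66 and Serial0 is 164.7.1.97 in the display.
print(is_valid_update_source("193.1.1.1", "164.7.1.66", MASK))   # False
print(is_valid_update_source("164.7.1.98", "164.7.1.97", MASK))  # True
```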
If the Physical and Data Link layer connections between the source and destination are good and the routing protocols are correctly configured but appropriate routes still do not appear in all the routing tables necessary, the output from these commands should give a clue as to where the route information is being dropped. The route information might be dropped because of passive interfaces, incorrectly configured redistribution, access lists, or distribute lists.
Assuming the output of the debug commands identifies where in the chain of routers from source to destination the required routing information is being dropped, you can examine the configuration of the suspect router to check for any passive interfaces, which will not send out any routing updates, or interfaces that have a distribute list applied. Distribute lists are useful if you want to reduce the size of routing updates sent. A distribute list works with a defined access list to identify the routes that will be advertised and the routes that will not. An example configuration of this combination is given here, allowing only the 164.7.0.0 network number to be advertised in outgoing IGRP updates.
access-list 1 permit 164.7.0.0
router igrp 11
network 164.7.0.0
distribute-list 1 out
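To see why this configuration suppresses every other route, it helps to model how a standard access list is evaluated: entries are tested in order, an entry without a wildcard mask matches one address exactly, and anything that matches no entry is denied. The Python sketch below is an illustration of that logic only, not of how IOS implements it; the names acl_permits and filter_update are hypothetical.

```python
import ipaddress


def acl_permits(route: str, entries) -> bool:
    """Evaluate a route against (action, address, wildcard) entries, first match wins."""
    route_bits = int(ipaddress.ip_address(route))
    for action, address, wildcard in entries:
        # Wildcard bits set to 1 are ignored; bits set to 0 must match.
        care_bits = ~int(ipaddress.ip_address(wildcard)) & 0xFFFFFFFF
        if (route_bits & care_bits) == (int(ipaddress.ip_address(address)) & care_bits):
            return action == "permit"
    return False  # implicit deny at the end of every access list


def filter_update(routes, entries):
    """Return only the routes that a 'distribute-list ... out' would let through."""
    return [route for route in routes if acl_permits(route, entries)]


# access-list 1 permit 164.7.0.0 (no wildcard given, so 0.0.0.0 applies)
ACCESS_LIST_1 = [("permit", "164.7.0.0", "0.0.0.0")]

print(filter_update(["164.7.0.0", "193.1.1.0"], ACCESS_LIST_1))  # ['164.7.0.0']
```

If a route you expect downstream is missing, running the candidate route numbers through this kind of check against the configured access list shows quickly whether the distribute list is the cause.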
If routing information is not being impeded by passive interfaces or distribute lists, incorrectly configured redistribution could be causing route update problems. If you suspect redistribution, the first thing to check is the default metric configured for redistributed routes. If this is missing or using a value that makes subsequent routers discard the route information, an adjustment in its value is necessary.
Having covered typical problems with Network layer configuration that result in lost connectivity, what can we do if two nodes can communicate, but there is a severe performance problem? We would first look at Physical and Data Link issues, such as performance of leased lines, framing or line errors, buffers, and hold queues. If checking these areas fails to uncover any problems, we can examine what is happening at the Network layer to cause slow performance. The best tool for diagnosing performance problems that are suspected of being due to Network layer operation is the Cisco trace command. An example of the use of trace is given in Fig. 8-13.
Router1>trace ip 164.7.1.97
Type escape sequence to abort.
Tracing the route to 164.7.1.97
1 164.7.1.66 4 msec 4 msec*
Router1>trace ip 164.7.1.98
Type escape sequence to abort.
Tracing the route to 164.7.1.98
1 164.7.1.66 4 msec 4 msec 4 msec
2 164.7.1.98 20 msec 20 msec*
Router1>trace ip 164.7.1.99
Type escape sequence to abort.
Tracing the route to 164.7.1.99
1 164.7.1.66 4 msec 4 msec 4 msec
2 164.7.1.98 16 msec 20 msec 16 msec
3 164.7.1.97 16 msec 16 msec 16 msec
4 164.7.1.98 32 msec 32 msec 32 msec
5 164.7.1.97 32 msec
Router1>
Figure 8-13: Use of the trace command
The first two traces in the display are successful. The second shows how trace reports the path from router 1 to router 3 through router 2 in the lab setup of three routers we have used throughout the book, listing each hop taken from source to destination. The last part of Fig. 8-13 shows an unsuccessful trace. Here, the trace command reports that the router can find the subnet where the destination IP address should be located, but it cannot find the host on the subnet, so it queries all devices it knows about on this subnet to see if they know about the target address. This display will continue until the maximum hop count is reached (30 by default) or until it is stopped by the escape sequence (pressing the Ctrl, Shift, and 6 keys simultaneously).
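The telltale sign of the unsuccessful trace is that the same addresses keep reappearing in the hop list. When trace output is long, that pattern can be picked out mechanically. The following Python sketch assumes the hop addresses have already been copied out of the display into a list; the function name first_repeated_hop is illustrative only.

```python
def first_repeated_hop(hops):
    """Return the first address that appears twice in a trace, or None if none repeats."""
    seen = set()
    for address in hops:
        if address in seen:
            return address
        seen.add(address)
    return None


# Hop addresses taken from the traces to 164.7.1.98 and 164.7.1.99 in Fig. 8-13.
good_trace = ["164.7.1.66", "164.7.1.98"]
bad_trace = ["164.7.1.66", "164.7.1.98", "164.7.1.97", "164.7.1.98", "164.7.1.97"]

print(first_repeated_hop(good_trace))  # None
print(first_repeated_hop(bad_trace))   # 164.7.1.98
```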
With internetworks that are either poorly designed, or which are experiencing a fault condition, the paths selected for routes can become suboptimal, leading to poor performance. A typical example of this is if a bridge or repeater is connected through the internetwork to two interfaces on a router. This will lead to a router receiving duplicate routing updates—effectively receiving the same updates on two interfaces. This is a condition that a router cannot deal with effectively, and unpredictable routing decisions result. The trace command output can identify these types of problems by reporting the path packets take from source to destination. If a suboptimal path is taken, the interconnections made on the internetwork need to be examined to resolve any conditions that the routing protocols cannot handle.
Overview of OSPF Troubleshooting.     OSPF and other link state protocols are more complex to troubleshoot than distance vector protocols. The problem we will consider initially is that of a ping packet that is not returned successfully from a remote host. Assuming that the Physical and Data Link layers have checked out and that all devices have OSPF enabled (the OSPF process number is locally significant and, unlike an IGRP autonomous system number, does not have to match between routers), troubleshooting an OSPF internetwork starts off in the same way as for an IGRP internetwork. The first task is to review the routing table entries, because each routing device from source to destination must have routing table entries that enable a packet to be routed in both directions for a ping request to be successful.
Assuming that the ping fails because of a lack of routing table entries, we would look first for any passive interfaces or distribute lists stopping the route information from being disseminated. If none exists, the OSPF configuration for each routing device must be reviewed. All interfaces that are to participate in OSPF routing need to have the network numbers to which they belong listed in the network commands that are entered as subcommands under the router ospf major command.
You can check whether OSPF is running on all expected interfaces by issuing the show ip ospf interface command for each interface. (A sample display was illustrated in Fig. 4-17.) Obviously, any interface that is not reporting OSPF information for this command is incorrectly configured and needs to be investigated.
Another useful command for troubleshooting at this level is the show ip ospf neighbor command, illustrated in Fig. 8-14, which identifies all the neighbors the router knows about via the OSPF protocol. If a router that you expect to see on this list does not show up, further investigation into the missing router's configuration is required.
Router1#show ip ospf neighbor
ID             Pri   State           Dead Time   Address        Interface
193.1.1.137    1     FULL/DR         0:00:31     160.8.8.3      Ethernet0
193.3.4.1      1     FULL/DROTHER    0:00:33     160.3.48.1     Serial0
192.1.8.2      1     FULL/DROTHER    0:00:33     160.3.48.20    Serial0
193.1.1.1      5     FULL/DR         0:00:33     160.3.48.18    Serial0
Router1#show ip ospf neighbor 193.1.1.37
Neighbor 193.1.1.37, interface address 160.8.8.3
In the area 0.0.0.0 via interface Ethernet0
Neighbor priority is 1, State is FULL
Options 2
Dead timer due in 0:00:32
Link State retransmission due in 0:00:04
Neighbor 193.1.1.37, interface address 192.31.48.189
In the area 0.0.0.0 via interface Serial0
Neighbor priority is 5, State is FULL
Options 2
Dead timer due in 0:00:32
Link State retransmission due in 0:00:03
The ID is the router ID of the OSPF neighbor. Pri is the priority of that router, which affects whether it is chosen as a designated router. Address is the source address of the interface that advertised this router, and Interface is the local interface through which the advertisement was received.
Figure 8-14: Output of the show ip ospf neighbor command
If all the routers appear in the show ip ospf neighbor display and all interfaces are enabled for OSPF, but you still cannot ping the desired host, check to make sure that each OSPF area has at least one border router and that the border router is connected to area 0. The only way to check this is by viewing the configuration of the border router. This is an important configuration requirement for OSPF internetworks, as all interarea communication has to go through area 0.
The last consideration is that of mismatched hello and dead timers. These timers can be viewed in the show ip ospf interface display. The values must be the same on all router interfaces attached to the same network; otherwise the routers will not become neighbors. If mismatched values are found, they can be altered in interface configuration mode by the ip ospf dead-interval and ip ospf hello-interval commands.
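When several routers share a network, comparing timer values by eye across many show ip ospf interface displays is error prone. The Python sketch below shows one way to flag the odd one out once the values have been noted down. The router names and timer values are hypothetical, and the function name mismatched_timers is illustrative only.

```python
from collections import Counter


def mismatched_timers(interfaces):
    """Return the names whose (hello, dead) pair differs from the most common pair.

    interfaces maps a label for each router interface on one network
    to the (hello, dead) values read from show ip ospf interface.
    """
    if not interfaces:
        return []
    common_pair, _count = Counter(interfaces.values()).most_common(1)[0]
    return sorted(name for name, pair in interfaces.items() if pair != common_pair)


# Hypothetical values for three routers on the same Ethernet segment.
segment = {
    "RouterA Ethernet0": (10, 40),
    "RouterB Ethernet0": (10, 40),
    "RouterC Ethernet0": (30, 120),
}

print(mismatched_timers(segment))  # ['RouterC Ethernet0']
```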
Troubleshooting Packet-Oriented WAN Protocols
This section will provide the essential information for initial troubleshooting of the packet-oriented WAN protocols, frame relay and X.25. The focus of this section is to explore why nodes might not be communicating for each of the protocols considered. Typically, there is more in the router configuration of an X.25 connection that can cause intermittent or poor performance than in a frame relay connection.
A frame relay connection is normally configured for connection to a public network, and performance issues with this type of connection are generally linked to the public network itself, or to Physical layer issues such as noisy lines or router buffer problems. Troubleshooting noisy lines and router buffer problems for frame relay follows the same process as described previously. We shall therefore look mainly at what can cause connectivity to fail in a frame relay environment. With X.25, we will look at router issues that can contribute to poor performance as well as to no connectivity.
Troubleshooting Lack of Connectivity over Frame Relay.     The most common configuration for connecting a router to a public frame relay network is for the public frame relay network to send data link connection identifier (DLCI) information to the router via an agreed LMI type. Again assuming that everything at the Physical and Data Link layers is working and that the show interface serial command reports an up condition for both interface and line protocol status, we will first want to see if LMI information is being received and sent.
The place to start is with the show frame-relay map command, as shown in Fig. 6-8, which will tell you if the router has successfully learned of the remote device protocol IDs. If this process has failed, there will be no entries in the display of this command. The process that should take place is that the local management interface (LMI) informs the router of the available DLCI numbers and the router uses inverse ARP to determine the protocol address of the devices at the other end of the PVCs identified by the DLCIs.
If a router is not registering the available DLCIs on its frame relay connection (which can be determined by issuing the show frame-relay pvc command as illustrated in Fig. 6-8), you should determine whether the switch is sending the information via the LMI. In this situation, the debug frame-relay lmi command should be used. If the frame relay switch is sending DLCI numbers via the LMI, they will be listed in the output of this command. If this debug command lists DLCI numbers being sent by the frame relay switch that are not shown in the show frame-relay map command, the LMI type used by the router should be confirmed as correct. If no DLCI numbers are listed, you need to contact the company supplying the frame relay connection to have the frame relay switch send the correct data.
This covers LMI operation; checking whether Inverse ARP worked is a little trickier. If the router learns about its DLCI numbers, but does not establish entries in its show frame-relay map command output, there are two possibilities. Either the two nodes communicating over the frame relay network are not configured to send broadcast routing updates, or the frame relay network is improperly configured. Chapter 6 covered setup of broadcast IGRP updates over frame relay links, and I will not repeat that here. If you find a situation in which the router knows of its DLCI numbers, and you are sure that all connected devices are set up for broadcast, and that IGRP, or some other appropriate routing protocol, is properly enabled on the connected devices, you have to inform the frame relay provider that Inverse ARP is not working over the network and seek help. In the meantime, static maps can be entered into the router with the frame-relay map command.
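The sequence of checks described in the last few paragraphs amounts to a small decision procedure. It is sketched here in Python purely as a memory aid. The three inputs are sets of DLCI numbers that you would read off the debug frame-relay lmi, show frame-relay pvc, and show frame-relay map displays by hand; the function name and the DLCI values are hypothetical.

```python
def diagnose_frame_relay(lmi_dlcis, pvc_dlcis, mapped_dlcis):
    """Suggest the next step from the DLCIs seen in each display.

    lmi_dlcis:    DLCIs listed by debug frame-relay lmi (sent by the switch)
    pvc_dlcis:    DLCIs the router has registered (show frame-relay pvc)
    mapped_dlcis: DLCIs with a protocol address (show frame-relay map)
    """
    if not lmi_dlcis and not pvc_dlcis:
        return "contact the frame relay provider: the switch is sending no DLCIs"
    if lmi_dlcis - pvc_dlcis:
        return "confirm the LMI type configured on the router"
    if pvc_dlcis - mapped_dlcis:
        return "check broadcast settings, then Inverse ARP; use static maps meanwhile"
    return "DLCIs are learned and mapped; look elsewhere"


# Hypothetical case: the switch sends DLCIs 100 and 200, the router registers
# both, but neither has a map entry.
print(diagnose_frame_relay({100, 200}, {100, 200}, set()))
```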
Troubleshooting Lack of Connectivity over X.25.     There are many similarities between a frame relay link and an X.25 link, such as the use of packet switching, PVC and SVC allocation, and support for multiple logical connections being established over a single physical connection. The differences between the two technologies are significant enough, however, to justify different troubleshooting procedures.
In frame relay, the DLCI number provides the key to delivering traffic on a frame relay connection. In X.25, the X.121 address is the key addressing element. A DLCI and an X.121 address are very different things. The DLCI has only local significance, so the same DLCI number can be assigned at both ends of the link and it will still work. The X.121 address, by contrast, has significance throughout the X.25 network and is an address that can be used to identify a single host on the network. Also, X.25 does not have the same Inverse ARP capabilities, so the router's configuration must be filled with all the necessary X.25 map statements to map IP to X.25 addresses.
Having established that X.25 is a very different type of packet switching technology from frame relay, let's consider how to resolve X.25 issues that can result in no connectivity across a network, then those that result in poor connectivity.
The case that we will use for this discussion is of two IP networks interconnected via an X.25 network, which is similar to the configuration used in the discussion on configuring X.25 interfaces in Chap. 6.
Assuming that all the Physical layer elements, such as leased lines and cables, are operational but connectivity across the X.25 network is still not available, we need to consider what at the X.25 level can stop communication. The first thing to do is determine that the X.121 addresses are correct, both for the address assigned to your X.25 interface and those used to address remote hosts in the X.25 map statements.
Next, check whether routing updates are getting from and to the remote locations. To do this, you must view the configuration of the routers and make sure that all the X.25 map statements include the keyword broadcast. This keyword ensures that IGRP or other routing protocol updates are transported over the X.25 network to all remote locations defined in the X.25 map statements.
The remaining issues that we shall consider can degrade performance, or in severe cases, deny connectivity altogether over an X.25 network.
The show interface serial command output for an interface with X.25 encapsulation (as illustrated in Chap. 6) lists reject (REJ), receiver not ready (RNR), frame error (FRMR), line disconnect (DISC), and protocol restart (RESTART) values that should all be low, by which I mean less than 0.5 percent of the number of information frames (IFRAME). If any of these values is greater than this 0.5 percent number, there is either a problem at the Physical level with the hardware and leased lines, or a configuration mismatch. With X.25 connections, you need to be concerned about matching the configuration of many variables for the connected devices at both the LAPB and X.25 level to ensure optimum communication. Table 8.2 lists the ID of the variable as reported in show interface serial and show x25 vc commands, a description of this variable, and the configuration command to change the variable.
Table 8.2: LAPB and X.25 Configuration Variables
ID of the Variable | Description | Configuration Command
LAPB T1 | Retransmission timer, or how long the router will wait for an acknowledgment before polling for a response | lapb t1 (value in milliseconds)
LAPB N1 | Maximum bits per frame | lapb n1 (no. of bits)
LAPB N2 | Number of retransmit attempts allowed before the link is declared down | lapb n2 (no. of tries)
LAPB k | LAPB window size, the maximum number of frames that can be transmitted before an acknowledgment is required | lapb k (number)
LAPB modulo | Frame numbering scheme; the maximum window size is the modulo less 1 | lapb modulo (8 or 128)
channels: incoming, two-way, outgoing | The lowest and highest permissible incoming, outgoing, and two-way X.25 logical channel numbers | x25 lic, hic, ltc, htc, loc, hoc
x25 modulo | The packet sequence numbering scheme | x25 modulo (8 or 128)
window size input, output | The window size configured for X.25 inbound and outbound packets | x25 win, wout
packet size input, output | The maximum X.25 packet size | x25 ips, ops
x25 timers | T10–13 for a DCE and T20–23 for DTE, set the restart, call, reset, and clear timers | x25 t10, t11, t12, t13, t20, t21, t22, t23
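The 0.5 percent rule of thumb given before Table 8.2 is easy to apply to a set of counters read from the show interface serial display. The Python sketch below is one way to do the arithmetic. The counter values are invented for illustration, and the function name counters_over_threshold is not a Cisco facility.

```python
THRESHOLD = 0.005  # 0.5 percent of information frames, per the rule of thumb in the text


def counters_over_threshold(counters, threshold=THRESHOLD):
    """Return {counter name: ratio to IFRAME} for every counter above the threshold."""
    iframes = counters["IFRAME"]
    if iframes <= 0:
        raise ValueError("IFRAME count must be positive to compute ratios")
    return {
        name: value / iframes
        for name, value in counters.items()
        if name != "IFRAME" and value / iframes > threshold
    }


# Hypothetical counters; 0.5 percent of 20000 information frames is 100.
sample = {"IFRAME": 20000, "REJ": 12, "RNR": 3, "FRMR": 0, "DISC": 1, "RESTART": 150}

print(sorted(counters_over_threshold(sample)))  # ['RESTART']
```

Any counter this check reports is a prompt to compare the variables in Table 8.2 at both ends of the link, or to look again at the hardware and leased line.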
If you can verify cabling, hardware, and leased-line operation; have appropriate addresses and X.25 DTE/DCE configuration; and can verify that all of the variables listed in Table 8.2 are compatible between the two communicating devices, there should be nothing to stop communication. If problems still exist, the last resort is to connect a serial line analyzer and see if one end is sending the SABM initialization sequence and the other is responding with UA frames. If all the troubleshooting activities listed here check out okay (i.e., everything is functioning as it should), you should contact the X.25 network vendor and ask for assistance in resolving any other issues.
The outputs of the debug lapb and debug x25 commands provide extensive and in-depth analysis of the communication between X.25-connected devices. If you are going to spend considerable amounts of time with LAPB and X.25 communication problems, it is worth referring to the Cisco documentation or talking to a Cisco Systems engineer.

 


 